Data Persistence and Sharing with Ceph#
While Docker Swarm is great for keeping containers running (and restarting those that fail), it does nothing for persistent storage.
This means that if you actually want your containers to keep any data across restarts (hint: you do!), you need to provide shared storage to every Docker node.
Ceph is an open-source, software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.
There are several ways to install Ceph. Choose the method that best suits your needs; this guide follows the Ceph documentation's recommendation and uses cephadm. Cephadm installs and manages a Ceph cluster using containers and systemd, with tight integration with the CLI and dashboard GUI. Notes on cephadm:
- Only supports Octopus and newer releases.
- Fully integrated with the new orchestration API and fully supports the new CLI and dashboard features to manage cluster deployment.
- Requires container support (podman or docker) and Python 3.
Prerequisites#
3 x Virtual Machines (the ones configured earlier for the lab, or dedicated machines for production), each with:
- A mainstream Linux OS (tested on CentOS 8/9 Stream)
- Support for "modern" versions of Python and LVM
- At least 2GB RAM
- At least 50GB disk space (but it'll be tight)
- Connectivity to each other within the same subnet, and on a low-latency link (i.e., no WAN links)
- At least one additional disk dedicated to the Ceph OSD (add one to the previously created hosts if needed)
- Each node should have the IP of every other participating node defined in /etc/hosts (including its own IP)
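Before going further, you can sanity-check these prerequisites on each node with standard tooling (nothing Ceph-specific; adjust as you see fit):
# Quick prerequisite check (run on every node)
python3 --version          # a modern Python is available
lvm version                # LVM tooling is present
free -h                    # at least 2GB RAM
lsblk                      # the extra disk for the OSD should show up here (e.g. /dev/sdb)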
/etc/hosts#
Add the following records to /etc/hosts on every node, adjusting the IPs if required.
# Ceph Hosts
10.0.0.41 ceph1
10.0.0.42 ceph2
10.0.0.43 ceph3
# Swarm Hosts
10.0.0.51 swarm1
10.0.0.52 swarm2
10.0.0.53 swarm3
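Once the records are in place, a simple loop like this (plain getent/ping; adjust the host list to your environment) confirms that every node resolves and answers:
# Verify name resolution and reachability of all hosts
for h in ceph1 ceph2 ceph3 swarm1 swarm2 swarm3; do
  getent hosts "$h" >/dev/null && ping -c1 -W1 "$h" >/dev/null && echo "$h OK" || echo "$h FAILED"
done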
Installing Docker and Docker Compose#
### Remove runc
dnf remove runc
### Adding docker repo
dnf install -y dnf-utils
dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
dnf install -y docker-ce docker-ce-cli containerd.io
### Installing docker-compose
dnf install -y python3-pip
pip3 install --upgrade pip
pip3 install setuptools-rust
pip3 install docker-compose
Let's start the whale#
systemctl enable docker --now
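Before moving on, it is worth confirming that the engine and docker-compose respond:
# Confirm Docker and docker-compose are installed and the daemon is running
docker --version
docker-compose --version
systemctl is-active docker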
Choose your first manager node#
One of your nodes will become the cephadm "master" node. Although all nodes will participate in the Ceph cluster, the master node is the one we bootstrap Ceph on. It's also the node which will run the Ceph dashboard, and on which future upgrades will be processed. It doesn't matter which node you pick, and the cluster itself will keep operating in the event of a loss of the master node (although you won't see the dashboard).
Install cephadm on master node#
Run the following on the master node:
RELEASE="quincy"
# Use curl to fetch the most recent version of the standalone script
curl --silent --remote-name --location https://raw.githubusercontent.com/ceph/ceph/$RELEASE/src/cephadm/cephadm
# Make the cephadm script executable:
chmod +x cephadm
# To install the packages that provide the cephadm command, run the following commands:
./cephadm add-repo --release $RELEASE
./cephadm install
# Install ceph-common and confirm that cephadm is now in your PATH by running which:
dnf install -y ceph-common
which cephadm
Bootstrap new Ceph cluster#
The first step in creating a new Ceph cluster is running the cephadm bootstrap command on the Ceph cluster's first host. This creates the cluster's first "monitor daemon", and that monitor daemon needs an IP address: you must pass the IP address of the first host to the bootstrap command, so you'll need to know it.
MYIP=`ip route get 1.1.1.1 | grep -oP 'src \K\S+'`
mkdir -p /etc/ceph
cephadm bootstrap --mon-ip $MYIP
The bootstrap command will:
- Create a monitor and manager daemon for the new cluster on the local host.
- Generate a new SSH key for the Ceph cluster and add it to the root user's /root/.ssh/authorized_keys file.
- Write a copy of the public key to /etc/ceph/ceph.pub.
- Write a minimal configuration file to /etc/ceph/ceph.conf. This file is needed to communicate with the new cluster.
- Write a copy of the client.admin administrative (privileged!) secret key to /etc/ceph/ceph.client.admin.keyring.
- Add the _admin label to the bootstrap host. By default, any host with this label will (also) get a copy of /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring.
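You can inspect the generated artifacts on the bootstrap node with plain filesystem commands:
# Inspect what the bootstrap wrote
ls -l /etc/ceph/
cat /etc/ceph/ceph.conf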
Check the Ceph dashboard: browse to https://ceph1:8443/ (or the IP address of ceph1), log in with the credentials printed in the cephadm bootstrap output, then set a new password.
Confirm that the ceph command is accessible with:
ceph -v
ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
Check the status of the Ceph cluster. A HEALTH_WARN state is expected at this point because no OSDs have been added yet:
ceph -s
cluster:
id: 588df728-316c-11ec-b956-005056aea762
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum ceph1 (age 14m)
mgr: ceph1.wgdjcn(active, since 12m)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
You can also check that the Ceph daemons are running as containers and systemd units on the node:
docker ps | grep ceph
systemctl status ceph-* --no-pager
Adding hosts to the cluster#
To add each new host to the cluster, perform two steps:
# Install the cluster's public SSH key in each new host's root user's authorized_keys file
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph3
# Tell Ceph that the new nodes are part of the cluster (make sure python3 is installed and available on each new node)
ceph orch host add ceph2
ceph orch host add ceph3
# Check the added hosts
ceph orch host ls
HOST ADDR LABELS STATUS
ceph1 10.0.0.41 _admin
ceph2 10.0.0.42
ceph3 10.0.0.43
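If python3 is missing on the hosts being added, install it there first; a minimal sketch assuming root SSH access and dnf-based hosts, as used in this lab:
# Ensure python3 is present on the nodes being added
ssh root@ceph2 "dnf install -y python3"
ssh root@ceph3 "dnf install -y python3"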
Deploy OSDs to the cluster#
Run this command to display an inventory of storage devices on all cluster hosts:
ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
ceph1 /dev/sdb ssd 150G Unknown N/A N/A Yes
ceph2 /dev/sdb ssd 150G Unknown N/A N/A Yes
ceph3 /dev/sdb ssd 150G Unknown N/A N/A Yes
Tell Ceph to consume any available and unused storage device by running ceph orch apply osd --all-available-devices:
ceph orch apply osd --all-available-devices
ceph -s
cluster:
id: 588df728-316c-11ec-b956-005056aea762
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 5m)
mgr: ceph1.wgdjcn(active, since 41m), standbys: ceph2.rmltzq
osd: 9 osds: 9 up, 9 in (since 10s)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.08817 root default
-5 0.02939 host ceph1
1 ssd 0.00980 osd.0 up 1.00000 1.00000
-7 0.02939 host ceph2
0 ssd 0.00980 osd.1 up 1.00000 1.00000
-3 0.02939 host ceph3
2 ssd 0.00980 osd.3 up 1.00000 1.00000
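If you'd rather not let Ceph consume every spare device, the orchestrator also lets you add OSDs explicitly, one host:device pair at a time; a sketch using the /dev/sdb devices from the inventory above:
# Alternative: add OSDs explicitly instead of --all-available-devices
ceph orch daemon add osd ceph1:/dev/sdb
ceph orch daemon add osd ceph2:/dev/sdb
ceph orch daemon add osd ceph3:/dev/sdb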
Deploy Ceph-mon (ceph monitor daemon)#
Ceph-mon is the cluster monitor daemon for the Ceph distributed file system. One or more instances of ceph-mon form a Paxos part-time parliament cluster that provides extremely reliable and durable storage of cluster membership, configuration, and state. Add ceph-mon to all nodes using the placement option:
ceph orch apply mon --placement="ceph1,ceph2,ceph3"
ceph orch ps | grep mon
mon.ceph1 ceph1 running (63m) 7m ago 63m 209M 2048M 16.2.6 02a72919e474 952d7
mon.ceph2 ceph2 running (27m) 7m ago 27m 104M 2048M 16.2.6 02a72919e474 f2d22
mon.ceph3 ceph3 running (25m) 7m ago 25m 104M 2048M 16.2.6 02a72919e474 bcc00
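To double-check that all three monitors have formed quorum, a concise status command is enough:
# Show monitor quorum membership
ceph mon stat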
Deploy Ceph-mgr (ceph manager daemon)#
The Ceph Manager daemon (ceph-mgr) runs alongside the monitor daemons to provide additional monitoring and interfaces to external monitoring and management systems. Add ceph-mgr to all nodes the same way:
ceph orch apply mgr --placement="ceph1,ceph2,ceph3"
ceph orch ps | grep mgr
mgr.ceph1.wgdjcn ceph1 *:9283 running (64m) 8m ago 64m 465M - 16.2.6 02a72919e474 c58a64249f9b
mgr.ceph2.rmltzq ceph2 *:8443,9283 running (29m) 8m ago 29m 385M - 16.2.6 02a72919e474 36f7f6a02896
mgr.ceph3.lhwjwd ceph3 *:8443,9283 running (7s) 2s ago 6s 205M - 16.2.6 02a72919e474 c740f964b2de
Set _admin label on all nodes#
The orchestrator supports assigning labels to hosts. Labels are free form and have no particular meaning by themselves; each host can have multiple labels, and labels can be used to specify the placement of daemons. The _admin label, however, forces replication of ceph.conf (and the admin keyring) to every node that carries it.
Official note:
By default, a ceph.conf file and a copy of the client.admin keyring are maintained in /etc/ceph on all hosts with the _admin label, which is initially applied only to the bootstrap host. We usually recommend that one or more other hosts be given the _admin label so that the Ceph CLI (e.g., via cephadm shell) is easily accessible on multiple hosts. To add the _admin label to additional host(s):
ceph orch host label add ceph2 _admin
ceph orch host label add ceph3 _admin
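Confirm the labels with the host listing used earlier:
# The _admin label should now appear next to each labelled host
ceph orch host ls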
![ceph_dashboardfinal.png](../../../images/Ceph_Dashboardfinal.png)
Prepare for CephFS mount#
It's now necessary to transfer the following files to your other nodes (in particular the ones that will mount the share), so that they'll be able to mount the CephFS when we're done:
| Path on master | Path on non-master |
|---|---|
| /etc/ceph/ceph.conf | /etc/ceph/ceph.conf |
| /etc/ceph/ceph.client.admin.keyring | /etc/ceph/ceph.client.admin.keyring |
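A minimal way to push both files, assuming root SSH access to the nodes that will mount the share (swap in your own host names):
# Copy the cluster config and admin keyring to the client nodes
for h in swarm1 swarm2 swarm3; do
  ssh root@$h "mkdir -p /etc/ceph"
  scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring root@$h:/etc/ceph/
done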
Setup CephFS#
On the master node, create a CephFS volume in your cluster by running ceph fs volume create data. Ceph will handle the necessary orchestration itself, creating the necessary pool, MDS daemon, etc.
You can watch the progress by running ceph fs ls (to see that the fs is configured) and ceph -s (to wait for HEALTH_OK).
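For reference, the commands mentioned above:
# Create the CephFS volume (Ceph creates the pools and MDS daemon for you)
ceph fs volume create data
# Watch progress
ceph fs ls
ceph -s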
Reproduce the following on each node that should mount the share (ceph-common must be installed so that the mount.ceph helper is available):
mkdir /mnt/swarm
echo -e "\n# Mount cephfs volume\nceph1,ceph2,ceph3:/ /mnt/swarm ceph name=admin,noatime,_netdev 0 0" >> /etc/fstab
mount -a
You can now play around, copying and deleting data in /mnt/swarm, and check the replication across the nodes.
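To tie this back to Docker Swarm, here is a minimal sketch of a service that bind-mounts a subdirectory of the shared CephFS as its data directory (the service name, image, published port and subdirectory are illustrative placeholders; run it on a Swarm manager, with /mnt/swarm mounted on every Swarm node):
# Create a subdirectory on the shared filesystem and serve it from a replicated service
mkdir -p /mnt/swarm/web
echo "hello from cephfs" > /mnt/swarm/web/index.html
docker service create --name web --replicas 3 --publish 8080:80 \
  --mount type=bind,source=/mnt/swarm/web,target=/usr/share/nginx/html \
  nginx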
References#
- https://docs.ceph.com/en/latest/cephadm/install/